Bugfix for batched gemv #2481

Merged: 2 commits into JuliaGPU:master on Feb 3, 2025
Conversation

kose-y (Contributor) commented Aug 28, 2024

Fix incorrect definition of m and n in gemv_strided_batched!

maleadt (Member) commented Aug 28, 2024

Shouldn't m and n switch depending on trans, just like with other wrappers?

# trans-dependent dimensions, as in the gemm-style wrappers:
m = size(A[1], trans == 'N' ? 1 : 2)
n = size(A[1], trans == 'N' ? 2 : 1)
lda = max(1,stride(A[1],2))
incx = stride(x[1],1)
incy = stride(y[1],1)
Aptrs = unsafe_batch(A)
xptrs = unsafe_batch(x)
yptrs = unsafe_batch(y)
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), trans, m, n, alpha, Aptrs, lda, xptrs, incx, beta, yptrs, incy, length(A))
else
$fname(handle(), trans, m, n, alpha, Aptrs, lda, xptrs, incx, beta, yptrs, incy, length(A))
end

Can you add a test that covers the case that doesn't work right now, and works after the change?

maleadt added the "needs tests" (Tests are requested.) and "bugfix" (This gets something working again.) labels on Aug 28, 2024
kose-y (Contributor, Author) commented Aug 28, 2024

No. According to the official cuBLAS documentation, the definitions of m and n differ between the gemm and gemv interfaces.

For gemm interfaces:

  • m: number of rows of matrix op(A) and C.
  • n: number of columns of matrix op(B) and C.

For gemv:

  • m: number of rows of matrix A.
  • n: number of columns of matrix A.

For gemv, they don't depend on op.
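
For reference, here is a minimal sketch of the dimension handling this reading implies for gemv_strided_batched!, assuming A is a 3-D array with the batch along the third dimension and x, y hold one column per batch entry; the helper name and the checks are illustrative only, not the PR's actual code:

using CUDA

# Illustrative sketch: m and n always describe A itself; only the expected
# lengths of x and y swap with `trans`.
function check_gemv_strided_batched_dims(trans::Char, A::CuArray{<:Any, 3},
                                         x::CuMatrix, y::CuMatrix)
    m, n = size(A, 1), size(A, 2)
    lenx = trans == 'N' ? n : m          # required length of each x column
    leny = trans == 'N' ? m : n          # required length of each y column
    if size(x, 1) != lenx || size(y, 1) != leny ||
            !(size(A, 3) == size(x, 2) == size(y, 2))
        throw(DimensionMismatch("mismatched dimensions for batched gemv"))
    end
    return m, n
end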

kose-y (Contributor, Author) commented Aug 28, 2024

I will try to add some tests this week.

kose-y (Contributor, Author) commented Aug 28, 2024

See also the gemv! function:

m,n = size(A)
# check dimensions
length(x) == (trans == 'N' ? n : m) && length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch(""))
# compute increments
lda = max(1,stride(A,2))
incx = stride(x,1)
incy = stride(y,1)

kose-y changed the title from "Bugfix for gemv_strided_batched!" to "Bugfix for batched gemv" on Aug 29, 2024
kose-y (Contributor, Author) commented Aug 29, 2024

@maleadt A similar bug was found in gemv_batched!, and it has also been fixed. Tests have now been added.
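
For the non-strided gemv_batched!, where each batch entry is a separate matrix, roughly the same checks apply. The following is an illustrative sketch under the assumption that A, x, y are Vectors of CuMatrix/CuVector; the helper name is hypothetical, not part of CUDA.jl:

using CUDA

# Illustrative sketch of the analogous checks for the Vector-of-matrices variant.
function check_gemv_batched_dims(trans::Char, A::Vector{<:CuMatrix},
                                 x::Vector{<:CuVector}, y::Vector{<:CuVector})
    length(A) == length(x) == length(y) || throw(DimensionMismatch("unequal batch lengths"))
    m, n = size(A[1])                    # dimensions of each A[i], independent of trans
    for i in 1:length(A)
        size(A[i]) == (m, n)                   || throw(DimensionMismatch("A[$i] has an inconsistent size"))
        length(x[i]) == (trans == 'N' ? n : m) || throw(DimensionMismatch("x[$i] has the wrong length"))
        length(y[i]) == (trans == 'N' ? m : n) || throw(DimensionMismatch("y[$i] has the wrong length"))
    end
    return m, n
end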

maleadt (Member) commented Aug 29, 2024

LGTM, let's just ping the original author of these functions: @lpawela

kose-y (Contributor, Author) commented Sep 9, 2024

@maleadt What is the status of this PR?

lpawela (Contributor) commented Sep 9, 2024

Sorry, this has been hanging on me. I'll have a look within a couple of days.

lpawela (Contributor) commented Sep 11, 2024

I'm having problems running the tests on this patch.

      From worker 2:    Stacktrace:
      From worker 2:      [1] throw_api_error(res::CUDA.cudaError_enum)
      From worker 2:        @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
      From worker 2:      [2] check
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
      From worker 2:      [3] cuMemFreeAsync
      From worker 2:        @ ~/lib/CUDA.jl/lib/utils/call.jl:34 [inlined]
      From worker 2:      [4] free(mem::CUDA.DeviceMemory; stream::CuStream)
      From worker 2:        @ CUDA ~/lib/CUDA.jl/lib/cudadrv/memory.jl:87
      From worker 2:      [5] free
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:82 [inlined]
      From worker 2:      [6] #1102
      From worker 2:        @ ~/lib/CUDA.jl/src/memory.jl:708 [inlined]
      From worker 2:      [7] #context!#990
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:168 [inlined]
      From worker 2:      [8] context!
      From worker 2:        @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:163 [inlined]
      From worker 2:      [9] _pool_free
      From worker 2:        @ ~/lib/CUDA.jl/src/memory.jl:707 [inlined]
      From worker 2:     [10] macro expansion
      From worker 2:        @ ./timing.jl:395 [inlined]
      From worker 2:     [11] pool_free(managed::CUDA.Managed{CUDA.DeviceMemory})
      From worker 2:        @ CUDA ~/lib/CUDA.jl/src/memory.jl:689
      From worker 2:     [12] release(::GPUArrays.RefCounted{CUDA.Managed{CUDA.DeviceMemory}})
      From worker 2:        @ GPUArrays ~/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:42
      From worker 2:     [13] unsafe_free!
      From worker 2:        @ ~/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:91 [inlined]
      From worker 2:     [14] unsafe_free!(xs::CuArray{Float32, 2, CUDA.DeviceMemory})
      From worker 2:        @ CUDA ~/lib/CUDA.jl/src/array.jl:94
      From worker 2:     [15] exit
      From worker 2:        @ ./initdefs.jl:28 [inlined]
      From worker 2:     [16] exit()
      From worker 2:        @ Base ./initdefs.jl:29
      From worker 2:     [17] #invokelatest#2
      From worker 2:        @ ./essentials.jl:892 [inlined]
      From worker 2:     [18] invokelatest(::Any)
      From worker 2:        @ Base ./essentials.jl:889
      From worker 2:     [19] (::Distributed.var"#118#120"{Distributed.RemoteDoMsg})()
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:310
      From worker 2:     [20] run_work_thunk(thunk::Distributed.var"#118#120"{Distributed.RemoteDoMsg}, print_error::Bool)
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:70
      From worker 2:     [21] (::Distributed.var"#117#119"{Distributed.RemoteDoMsg})()
      From worker 2:        @ Distributed ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:310
      From worker 2:    WARNING: Error while freeing DeviceMemory(1.562 KiB at 0x0000000302122a00):
      From worker 2:    CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc))

This happens when launching julia --project test/runtests.jl libraries/cublas

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 550.90.7

CUDA libraries: 
- CUBLAS: 12.6.0
- CURAND: 10.3.7
- CUFFT: 11.2.6
- CUSOLVER: 11.6.4
- CUSPARSE: 12.5.2
- CUPTI: 2024.3.0 (API 24.0.0)
- NVML: 12.0.0+550.90.7

Julia packages: 
- CUDA: 5.5.0
- CUDA_Driver_jll: 0.10.0+0
- CUDA_Runtime_jll: 0.15.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 7.857 GiB / 10.000 GiB available)

maleadt (Member) commented Sep 12, 2024

julia>  CUDA.CuError(CUDA.cudaError_enum(0x000002bc))
CuError(CUDA_ERROR_ILLEGAL_ADDRESS)

The changes in this PR seem to be triggering an illegal memory access.

maleadt (Member) commented Sep 17, 2024

I'm seeing similar issues locally, but I'm having a hard time isolating the problem. The libraries/cublas test suite often hangs on this PR, while running only the gemv tests modified here does not reproduce the issue.

maleadt (Member) commented Sep 18, 2024

Actually, some more testing today reveals that the illegal memory access I was seeing locally comes from a different test.

@lpawela I cannot reproduce the isolated failure of the libraries/cublas test suite you are seeing. Is this still the case on the latest version of this PR? Does it reproduce with just the gemv tests from this PR?

using CUDA.CUBLAS, GPUArrays
using CUDA, Test, LinearAlgebra

using Adapt

struct ArrayAdaptor{AT} end
Adapt.adapt_storage(::ArrayAdaptor{AT}, xs::AbstractArray) where {AT} = AT(xs)

test_result(a::Number, b::Number; kwargs...) = ≈(a, b; kwargs...)
test_result(a::Missing, b::Missing; kwargs...) = true
test_result(a::Number, b::Missing; kwargs...) = false
test_result(a::Missing, b::Number; kwargs...) = false
function test_result(a::AbstractArray{T}, b::AbstractArray{T}; kwargs...) where {T<:Number}
    ≈(collect(a), collect(b); kwargs...)
end
function test_result(a::AbstractArray{T}, b::AbstractArray{T};
                     kwargs...) where {T<:NTuple{N,<:Number} where {N}}
    ET = eltype(T)
    ≈(reinterpret(ET, collect(a)), reinterpret(ET, collect(b)); kwargs...)
end
function test_result(as::NTuple{N,Any}, bs::NTuple{N,Any}; kwargs...) where {N}
    all(zip(as, bs)) do (a, b)
        test_result(a, b; kwargs...)
    end
end

function compare(f, AT::Type{<:AbstractGPUArray}, xs...; kwargs...)
    # copy on the CPU, adapt on the GPU, but keep Ref's
    cpu_in = map(x -> isa(x, Base.RefValue) ? x[] : deepcopy(x), xs)
    gpu_in = map(x -> isa(x, Base.RefValue) ? x[] : adapt(ArrayAdaptor{AT}(), x), xs)

    cpu_out = f(cpu_in...)
    gpu_out = f(gpu_in...)

    test_result(cpu_out, gpu_out; kwargs...)
end

function compare(f, AT::Type{<:Array}, xs...; kwargs...)
    # no need to actually run these tests: we have nothing to compare against,
    # and we'll run it on a CPU array anyhow when comparing to a GPU array.
    #
    # this method exists so that we can at least run the test suite with Array,
    # and make sure we cover other tests (that don't call `compare`) too.
    return true
end

testf(f, xs...; kwargs...) = compare(f, CuArray, xs...; kwargs...)


m = 20
n = 35
k = 13


@testset for elty in [Float32, Float64, ComplexF32, ComplexF64]
    alpha = rand(elty)
    beta = rand(elty)

    @testset "gemv" begin
        @test testf(*, rand(elty, m, n), rand(elty, n))
        @test testf(*, transpose(rand(elty, m, n)), rand(elty, m))
        @test testf(*, rand(elty, m, n)', rand(elty, m))
        x = rand(elty, m)
        A = rand(elty, m, m + 1 )
        y = rand(elty, n)
        dx = CuArray(x)
        dA = CuArray(A)
        dy = CuArray(y)
        @test_throws DimensionMismatch mul!(dy, dA, dx)
        A = rand(elty, m + 1, m )
        dA = CuArray(A)
        @test_throws DimensionMismatch mul!(dy, dA, dx)
        x = rand(elty, m)
        A = rand(elty, n, m)
        dx = CuArray(x)
        dA = CuArray(A)
        alpha = rand(elty)
        dy = CUBLAS.gemv('N', alpha, dA, dx)
        hy = collect(dy)
        @test hy ≈ alpha * A * x
        dy = CUBLAS.gemv('N', dA, dx)
        hy = collect(dy)
        @test hy ≈ A * x
        dy = CuArray(y)
        dx = CUBLAS.gemv(elty <: Real ? 'T' : 'C', alpha, dA, dy)
        hx = collect(dx)
        @test hx ≈ alpha * A' * y
    end

    if CUBLAS.version() >= v"11.9"
        @testset "gemv_batched" begin
            x = [rand(elty, m) for i=1:10]
            A = [rand(elty, n, m) for i=1:10]
            y = [rand(elty, n) for i=1:10]
            dx = CuArray{elty, 1}[]
            dA = CuArray{elty, 2}[]
            dy = CuArray{elty, 1}[]
            dbad = CuArray{elty, 1}[]
            for i=1:length(A)
                push!(dA, CuArray(A[i]))
                push!(dx, CuArray(x[i]))
                push!(dy, CuArray(y[i]))
                if i < length(A) - 2
                    push!(dbad,CuArray(dx[i]))
                end
            end
            @test_throws DimensionMismatch CUBLAS.gemv_batched!('N', alpha, dA, dx, beta, dbad)
            CUBLAS.gemv_batched!('N', alpha, dA, dx, beta, dy)
            for i=1:length(A)
                hy = collect(dy[i])
                y[i] = alpha * A[i] * x[i] + beta * y[i]
                @test y[i] ≈ hy
            end
            dy = CuArray{elty, 1}[]
            for i=1:length(A)
                push!(dy, CuArray(y[i]))
            end
            CUBLAS.gemv_batched!(elty <: Real ? 'T' : 'C', alpha, dA, dy, beta, dx)
            for i=1:size(A, 3)   # note: A is a Vector here, so size(A, 3) == 1; the PR's test code later changes this to 1:length(A)
                hx = collect(dx[i])
                x[i] = alpha * A[i]' * y[i] + beta * x[i]
                @test x[i] ≈ hx
            end
        end
    end

    if CUBLAS.version() >= v"11.9"
        @testset "gemv_strided_batched" begin
            x = rand(elty, m, 10)
            A = rand(elty, n, m, 10)
            y = rand(elty, n, 10)
            bad = rand(elty, m, 10)
            dx = CuArray(x)
            dA = CuArray(A)
            dy = CuArray(y)
            dbad = CuArray(bad)
            @test_throws DimensionMismatch CUBLAS.gemv_strided_batched!('N', alpha, dA, dx, beta, dbad)
            bad = rand(elty, n, 2)
            dbad = CuArray(bad)
            @test_throws DimensionMismatch CUBLAS.gemv_strided_batched!('N', alpha, dA, dx, beta, dbad)
            CUBLAS.gemv_strided_batched!('N', alpha, dA, dx, beta, dy)
            for i=1:size(A, 3)
                hy = collect(dy[:, i])
                y[:, i] = alpha * A[:, :, i] * x[:, i] + beta * y[:, i]
                @test y[:, i] ≈ hy
            end
            dy = CuArray(y)
            CUBLAS.gemv_strided_batched!(elty <: Real ? 'T' : 'C', alpha, dA, dy, beta, dx)
            for i=1:size(A, 3)
                hx = collect(dx[:, i])
                x[:, i] = alpha * A[:, :, i]' * y[:, i] + beta * x[:, i]
                @test x[:, i] ≈ hx
            end
        end
    end
end

lpawela (Contributor) commented Sep 18, 2024

I still get the same error, even on a different machine, when running the command julia --project test/runtests.jl libraries/cublas

julia> using CUDA
julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.2
NVIDIA driver 535.183.1

CUDA libraries: 
- CUBLAS: 12.6.1
- CURAND: 10.3.7
- CUFFT: 11.2.6
- CUSOLVER: 11.6.4
- CUSPARSE: 12.5.3
- CUPTI: 2024.3.1 (API 24.0.0)
- NVML: 12.0.0+535.183.1

Julia packages: 
- CUDA: 5.5.0
- CUDA_Driver_jll: 0.10.1+0
- CUDA_Runtime_jll: 0.15.2+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 3090 (sm_86, 22.477 GiB / 24.000 GiB available)

maleadt added the "needs changes" (Changes are needed.) label and removed the "needs tests" (Tests are requested.) label on Sep 18, 2024
maleadt (Member) commented Sep 18, 2024

Okay, thanks for confirming! Marked as draft until we figure out the exact issue here.

EDIT: does the isolated reproducer also give the same error?

maleadt marked this pull request as draft on September 18, 2024, 10:20
maleadt force-pushed the master branch 9 times, most recently from 7ec9719 to c5571cf, on December 19, 2024
maleadt force-pushed the master branch 6 times, most recently from 5d585c4 to c850163, on December 20, 2024
amontoison (Member) commented:

@kose-y Is it possible to rebase this PR?

kose-y (Contributor, Author) commented on this hunk from the test code:

push!(dy, CuArray(y[i]))
end
CUBLAS.gemv_batched!(elty <: Real ? 'T' : 'C', alpha, dA, dy, beta, dx)
for i=1:length(A)

Previous bug: This line was for i = 1:size(A, 3), where A is a Vector of CuMatrix.

Not sure if that is related to illegal memory access.
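
As a quick illustration of why that loop silently under-tested: in Julia, size(A, d) returns 1 for any dimension d beyond ndims(A), so with A a Vector of matrices the loop body ran only once instead of once per batch entry:

julia> A = [rand(3, 3) for _ in 1:10];   # a Vector of matrices, like the batched test data

julia> size(A, 3)    # trailing dimensions of a Vector are reported as 1
1

julia> length(A)     # the intended batch count
10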

kose-y (Contributor, Author) commented Feb 3, 2025

I have attempted to rebase and, in the process, fixed a bug in the test code.

#2481 (comment)

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 73.37%. Comparing base (69f3a76) to head (2e316ac).
Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
lib/cublas/wrappers.jl 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2481      +/-   ##
==========================================
+ Coverage   73.25%   73.37%   +0.12%     
==========================================
  Files         157      157              
  Lines       15272    15268       -4     
==========================================
+ Hits        11187    11203      +16     
+ Misses       4085     4065      -20     


amontoison marked this pull request as ready for review on February 3, 2025, 03:49
github-actions bot commented Feb 3, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/test/libraries/cublas/level2.jl b/test/libraries/cublas/level2.jl
index ce65052c4..f93e79b04 100644
--- a/test/libraries/cublas/level2.jl
+++ b/test/libraries/cublas/level2.jl
@@ -77,7 +77,7 @@ k = 13
                     @test y[i] ≈ hy
                 end
                 dy = CuArray{elty, 1}[]
-                for i=1:length(A)
+                for i in 1:length(A)
                     push!(dy, CuArray(y[i]))
                 end
                 CUBLAS.gemv_batched!(elty <: Real ? 'T' : 'C', alpha, dA, dy, beta, dx)

amontoison (Member) commented Feb 3, 2025

Thanks @kose-y!
Can you push one more commit so that the full test suite is triggered?
While the PR was a draft, only a subset of the tests was executed.

amontoison removed the "needs changes" (Changes are needed.) label on Feb 3, 2025
kose-y (Contributor, Author) commented Feb 3, 2025

OK, here you go.

github-actions bot left a comment:

CUDA.jl Benchmarks

Benchmark suite Current: 2e316ac Previous: 24c236a Ratio
latency/precompile 46304515857 ns 46329110276 ns 1.00
latency/ttfp 6988044874 ns 6970088608 ns 1.00
latency/import 3648625168 ns 3628117061 ns 1.01
integration/volumerhs 9622426 ns 9622596.5 ns 1.00
integration/byval/slices=1 146790 ns 147068 ns 1.00
integration/byval/slices=3 425208 ns 425521 ns 1.00
integration/byval/reference 144849.5 ns 144904 ns 1.00
integration/byval/slices=2 286097 ns 286164 ns 1.00
integration/cudadevrt 103234 ns 103422 ns 1.00
kernel/indexing 14150 ns 14042 ns 1.01
kernel/indexing_checked 14761 ns 14520 ns 1.02
kernel/occupancy 643.0441176470588 ns 633.7076023391813 ns 1.01
kernel/launch 2048.1499999999996 ns 2101.9 ns 0.97
kernel/rand 17039 ns 14379 ns 1.18
array/reverse/1d 19865 ns 19719 ns 1.01
array/reverse/2d 24149 ns 25121 ns 0.96
array/reverse/1d_inplace 10495 ns 11160 ns 0.94
array/reverse/2d_inplace 11919 ns 13070 ns 0.91
array/copy 21546 ns 21024 ns 1.02
array/iteration/findall/int 151894.5 ns 155786 ns 0.98
array/iteration/findall/bool 137497 ns 134662.5 ns 1.02
array/iteration/findfirst/int 153659 ns 147148 ns 1.04
array/iteration/findfirst/bool 153940 ns 154167.5 ns 1.00
array/iteration/scalar 60775 ns 61499 ns 0.99
array/iteration/logical 209734.5 ns 203811.5 ns 1.03
array/iteration/findmin/1d 39163.5 ns 39639 ns 0.99
array/iteration/findmin/2d 94362 ns 94387 ns 1.00
array/reductions/reduce/1d 41562.5 ns 30507 ns 1.36
array/reductions/reduce/2d 47023.5 ns 51213.5 ns 0.92
array/reductions/mapreduce/1d 39516.5 ns 30459 ns 1.30
array/reductions/mapreduce/2d 51401.5 ns 51487 ns 1.00
array/broadcast 20832 ns 20729 ns 1.00
array/copyto!/gpu_to_gpu 13807 ns 11560 ns 1.19
array/copyto!/cpu_to_gpu 210029 ns 208930 ns 1.01
array/copyto!/gpu_to_cpu 244002 ns 241934 ns 1.01
array/accumulate/1d 109245.5 ns 108332 ns 1.01
array/accumulate/2d 81223 ns 80060 ns 1.01
array/construct 1291.7 ns 1258.3 ns 1.03
array/random/randn/Float32 47413 ns 42993 ns 1.10
array/random/randn!/Float32 27000 ns 26226 ns 1.03
array/random/rand!/Int64 26953 ns 26878 ns 1.00
array/random/rand!/Float32 8504 ns 8569 ns 0.99
array/random/rand/Int64 38226 ns 29957 ns 1.28
array/random/rand/Float32 13232 ns 12962 ns 1.02
array/permutedims/4d 61212 ns 61080 ns 1.00
array/permutedims/2d 55505.5 ns 55180 ns 1.01
array/permutedims/3d 56559 ns 55780 ns 1.01
array/sorting/1d 2776294 ns 2774685 ns 1.00
array/sorting/by 3367591 ns 3367411.5 ns 1.00
array/sorting/2d 1085256 ns 1085055 ns 1.00
cuda/synchronization/stream/auto 1022.0666666666667 ns 1048.8 ns 0.97
cuda/synchronization/stream/nonblocking 6388.4 ns 6455.4 ns 0.99
cuda/synchronization/stream/blocking 807.9901960784314 ns 846.986301369863 ns 0.95
cuda/synchronization/context/auto 1193.5 ns 1230.1 ns 0.97
cuda/synchronization/context/nonblocking 6679.4 ns 6670.8 ns 1.00
cuda/synchronization/context/blocking 916.921052631579 ns 928 ns 0.99

This comment was automatically generated by workflow using github-action-benchmark.

maleadt merged commit 86b255e into JuliaGPU:master on Feb 3, 2025
3 checks passed
Labels: bugfix (This gets something working again.)

4 participants